Towards optimal packed string matching

Authors

Oren Ben-Kiki

Philip Bille

Dany Breslauer

Leszek Gasieniec

Roberto Grossi

Oren Weimann

Abstract

In the packed string matching problem, it is assumed that each machine word can accommodate up to α characters, so an n-character string occupies n/α memory words.

(a) We extend the Crochemore-Perrin constant-space O(n)-time string matching algorithm to run in optimal O(n/α) time and even in real-time, achieving a factor α speedup over traditional algorithms that examine each character individually. Our macro-level algorithm uses only the standard AC^0 instructions of the word-RAM model (i.e. no integer multiplication) plus two specialized micro-level AC^0 word-size packed string instructions. The main word-size string-matching instruction wssm is available in contemporary commodity processors. The other word-size maximum-suffix instruction wslm is only required during the pattern preprocessing. Benchmarks show that our solution can be efficiently implemented, unlike some prior theoretical packed string matching work.

(b) We also consider the complexity of the packed string matching problem in the classical word-RAM model in the absence of the specialized micro-level instructions wssm and wslm. We propose micro-level algorithms for their theoretically efficient emulation, using parallel algorithms techniques to emulate wssm and the Four-Russians technique to emulate wslm. Surprisingly, our bit-parallel emulation of wssm also leads to a new simplified parallel random access machine string matching algorithm. As a byproduct, to facilitate our results, we develop a new algorithm for finding the leftmost (most significant) 1 bits in consecutive non-overlapping blocks of uniform size inside a word. This latter problem is not known to be reducible to finding the rightmost 1, which can be easily solved, since we do not know how to reverse the bits of a word in O(1) time.

Interestingly, the pattern preprocessing steps of our macro-level algorithm, our micro-level algorithm, and our parallel random access machine algorithm are all less efficient than their corresponding text processing, leaving gaps in the complexities of these three problems.

∗ Preliminary versions of this work were presented at FSTTCS 2011 [9] and at CPM 2012 [16].
† Intel Research and Development Center, Haifa, Israel.
‡ Technical University of Denmark, Copenhagen, Denmark.
§ Caesarea Rothschild Institute for Interdisciplinary Applications of Computer Science, University of Haifa, Haifa, Israel. Partially supported by the European Research Council (ERC) Project SFEROT and by the Israeli Science Foundation Grants 686/07, 347/09 and 864/11.
¶ University of Liverpool, Liverpool, United Kingdom.
‖ Dipartimento di Informatica, Università di Pisa, Pisa, Italy. Partially supported by Italian project PRIN AlgoDEEP (2008TFBWL4) of MIUR.
∗∗ Computer Science Department, University of Haifa, Haifa, Israel.
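To make the packed setting concrete, the following is a minimal Python sketch of how characters might be packed into words and of what a word-size string-matching step could compute. Everything in it is an assumption for illustration, not the paper's definition: the constants ALPHA and CHAR_BITS, the helper names pack and wssm_reference, the little-endian layout, and the choice to report occurrences as a bitmask within a single text word. The loop is a plain reference that spends O(α) steps per word; the abstract's point is that a wssm instruction (or its bit-parallel emulation) would do this work in constant time per word.

```python
from typing import List

# Assumed parameters: 8-bit characters packed into a 64-bit word, so alpha = 8.
ALPHA = 8
CHAR_BITS = 8


def pack(s: bytes) -> List[int]:
    """Pack bytes into words, ALPHA characters per word.

    The first character of each chunk goes into the low-order byte
    (an arbitrary layout chosen for this sketch). A short final chunk
    is implicitly zero-padded in its high-order bytes.
    """
    return [
        int.from_bytes(s[i:i + ALPHA], "little")
        for i in range(0, len(s), ALPHA)
    ]


def wssm_reference(text_word: int, pattern_word: int, m: int) -> int:
    """Reference semantics for a hypothetical word-size matching step.

    Returns a bitmask whose bit i is set iff the m-character pattern
    occurs at character offset i inside the single text word.
    Occurrences that straddle a word boundary are not reported here.
    """
    mask = (1 << (m * CHAR_BITS)) - 1
    pattern = pattern_word & mask
    result = 0
    for i in range(ALPHA - m + 1):
        if (text_word >> (i * CHAR_BITS)) & mask == pattern:
            result |= 1 << i
    return result
```

For example, under these assumptions pack(b"abracadabra") yields two words, and wssm_reference applied to the first word ("abracada") with the one-character pattern "a" sets bits 0, 3, 5 and 7. A full matcher would also have to handle patterns longer than a word and occurrences that cross word boundaries, which this sketch deliberately leaves out.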

Similar resources

Optimal Packed String Matching

In the packed string matching problem, each machine word accommodates α characters, thus an n-character text occupies n/α memory words. We extend the Crochemore-Perrin constant-space O(n)-time string matching algorithm to run in optimal O(n/α) time and even in real-time, achieving a factor α speedup over traditional algorithms that examine each character individually. Our solution can be efficie...

Fast Searching in Packed Strings

Given strings P and Q, the (exact) string matching problem is to find all positions of substrings in Q matching P. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time, which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in ...

Fast Packed String Matching for Short Patterns

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science, with applications in many other fields such as natural language processing, information retrieval and computational biology. In the last two decades a general trend has emerged of exploiting the power of the word RAM model to speed up classical string matching algorithms. In ...

Tighter Packed Bit-Parallel NFA for Approximate String Matching

We propose a new variant of the bit-parallel NFA of Baeza-Yates and Navarro (BPD) for approximate string matching [1]. Given a length-m pattern and an error threshold k, the original BPD uses (m−k)(k+2) bits of space. We decrease this to (m−k)(k+1), and also give a slightly more efficient simulation algorithm for the NFA. In experiments our modified NFA is often noticeably more efficient tha...

Average-optimal string matching

The exact string matching problem is to find the occurrences of a pattern of length m in a text of n symbols. We develop a novel and unorthodox filtering technique for this problem. Our method is based on transforming the problem into multiple matching of carefully chosen pattern subsequences. While this is seemingly more difficult than the original problem, we show that the idea leads...

Journal: Theor. Comput. Sci.

Volume: 525

Publication date: 2014
